INTRODUCTION TO DATASET

The dataset analyzed in this report was sourced from Kaggle (https://www.kaggle.com/datasets/mvieira101/global-cost-of-living) and was originally gathered by scraping Numbeo’s website (https://www.numbeo.com). Kaggle is a well-established platform that hosts a large and diverse collection of datasets contributed by practitioners in many fields.

The data are provided uncleaned, which reflects a common real-world challenge in data science and machine learning: data is rarely clean and ready to use. This gives an opportunity to practise tackling realistic problems, which makes the dataset well suited to a data analysis exercise. The dataset is current as of December 2022.

This dataset was selected to demonstrate the ability to analyze data and produce meaningful visualizations. The aim is to explore new libraries and functions in R and to improve my data analysis skills. The goal of the analysis is to gain insight into the global cost of living and to identify any patterns or trends in the data.

I like to travel, and I always spend a lot of time researching living expenses before planning a trip to a country. Personally, this data helps me estimate the average living expenses in each place and find cheaper destinations. I also often get questions from students and professionals about studying abroad for a master’s degree. They are curious about the cost of living in different countries and need to know whether they can manage it during their studies. So I decided to take this data and transform it into a meaningful form. The dataset covers the cost of living in almost 5000 cities across the world.

The dataset consists of 58 columns and initially contains 4956 rows. The columns “city” and “country” are character data types, representing the location of the data. A column “continent” has been added to the dataset, in order to provide a categorical variable for the corresponding “country” column.

The remaining columns in the dataset represent the expenses of various commodities, with the cost values being in US Dollars. However, these columns are not named in a descriptive manner and instead are labelled as “x1”, “x2”, “x3”, and so on. The delimiter used in the dataset is the comma (“,”).

DATA CLEANING METHODOLOGY

This is a major part of the project: if we do not clean and format the data properly, our results will be affected and may lead to wrong conclusions.

The data cleaning methodology is described below:

  • The first step is to import the libraries needed for cleaning the data.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   1.0.0 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.5.0 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(dplyr) 
library(countrycode)
library(openxlsx) 
library(ggplot2)
library(stringr)
  • Load the raw, uncleaned data downloaded from Kaggle.
dataset <- read_csv("D:/R_Studio_class/cost-of-living_v2.csv")
## Rows: 4956 Columns: 58
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): city, country
## dbl (56): x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13, x14, x15, ...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
  • Select the columns relevant to the analysis, using the select() function.
dataset <- dataset %>% select(city,country, x1,x8,x9,x12,x14,x16,x17,x18,x19,x20,x21,x22,x29,x36,x48)
dataset
## # A tibble: 4,956 × 17
##    city      country    x1    x8    x9   x12   x14   x16   x17   x18   x19   x20
##    <chr>     <chr>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 Seoul     South …  7.68  0.79  2.2   4.04 10.6   6.77  3.71  6.5   6.19  3.84
##  2 Shanghai  China    5.69  0.33  2.74  2.22  4.86  2.26  1.6   2.19  1.53  0.84
##  3 Guangzhou China    4.13  0.33  1.91  1.71  3.77  2.02  1.44  1.82  1.31  0.74
##  4 Mumbai    India    3.68  0.19  0.75  0.95  3.69  2.09  0.67  1.34  0.59  0.44
##  5 Delhi     India    4.91  0.19  0.73  1.02  3.81  1.79  0.75  1.03  0.61  0.37
##  6 Dhaka     Bangla…  1.95  0.16  0.83  1.32  3.07  2.38  1.04  2.18  0.96  0.28
##  7 Osaka     Japan    7.45  0.81  1.41  1.9   6.93  4.09  1.85  3.87  6.34  3.31
##  8 Jakarta   Indone…  2.59  0.27  1.3   1.71  3.52  2.99  1.6   2.13  1.3   1.42
##  9 Shenzhen  China    4.27  0.34  2.23  2.13  4.37  1.97  1.47  1.81  1.45  1.06
## 10 Kinshasa  Congo   15.1   0.84  2     4.15  5    10     3.25  5.25  6.33  3.6 
## # … with 4,946 more rows, and 5 more variables: x21 <dbl>, x22 <dbl>,
## #   x29 <dbl>, x36 <dbl>, x48 <dbl>
  • Convert the country variable from character to a categorical (factor) variable.
dataset$country <- as.factor(dataset$country)
  • Rename the columns with descriptive, easy-to-understand names.
dataset <- dataset %>% rename(
  meal_in_restaurant = x1, water_price = x8, Milk_onelitre = x9,
  Eggs_regular = x12, Chicken_1kg = x14, Apples_1Kg = x16,
  Banana_1kg = x17, Oranges_1kg = x18, Tomato_1Kg = x19,
  Potato_1kg = x20, Onion_1kg = x21, Lettuce_1head = x22,
  Monthly_Travel_Pass = x29, Basic_Amenties_Bill = x36,
  Single_bedroom_rent = x48
)
head(dataset)
## # A tibble: 6 × 17
##   city   country meal_…¹ water…² Milk_…³ Eggs_…⁴ Chick…⁵ Apple…⁶ Banan…⁷ Orang…⁸
##   <chr>  <fct>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 Seoul  South …    7.68    0.79    2.2     4.04   10.6     6.77    3.71    6.5 
## 2 Shang… China      5.69    0.33    2.74    2.22    4.86    2.26    1.6     2.19
## 3 Guang… China      4.13    0.33    1.91    1.71    3.77    2.02    1.44    1.82
## 4 Mumbai India      3.68    0.19    0.75    0.95    3.69    2.09    0.67    1.34
## 5 Delhi  India      4.91    0.19    0.73    1.02    3.81    1.79    0.75    1.03
## 6 Dhaka  Bangla…    1.95    0.16    0.83    1.32    3.07    2.38    1.04    2.18
## # … with 7 more variables: Tomato_1Kg <dbl>, Potato_1kg <dbl>, Onion_1kg <dbl>,
## #   Lettuce_1head <dbl>, Monthly_Travel_Pass <dbl>, Basic_Amenties_Bill <dbl>,
## #   Single_bedroom_rent <dbl>, and abbreviated variable names
## #   ¹​meal_in_restaurant, ²​water_price, ³​Milk_onelitre, ⁴​Eggs_regular,
## #   ⁵​Chicken_1kg, ⁶​Apples_1Kg, ⁷​Banana_1kg, ⁸​Oranges_1kg
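
The renaming step can also be driven by a lookup vector, which keeps the old-to-new mapping in one place. The following is a minimal sketch on toy data, not the report's actual code: the data frame `raw` and the vector `lookup` are hypothetical names introduced here, and only two of the fifteen mappings are shown.

```r
library(dplyr)

# Toy stand-in for the raw data (hypothetical; the real file has 58 columns).
raw <- data.frame(
  city = c("Seoul", "Mumbai"),
  x1 = c(7.68, 3.68),
  x8 = c(0.79, 0.19),
  stringsAsFactors = FALSE
)

# Named character vector: new_name = "old_name".
lookup <- c(meal_in_restaurant = "x1", water_price = "x8")

# all_of() with a named vector renames the matched columns and
# raises an error if any old name is missing from the data.
renamed <- raw %>% rename(all_of(lookup))

names(renamed)
```

Because `all_of()` fails on a missing column, a typo in an old name is reported immediately instead of being silently ignored.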
  • Combine the Apples, Banana and Oranges cost columns into a single Fruits_Avg_Price column (their mean).
dataset <- dataset %>% 
  mutate(Fruits_Avg_Price = (Apples_1Kg + Banana_1kg + Oranges_1kg) / 3) %>% 
  select(-Apples_1Kg, -Banana_1kg, -Oranges_1kg)
head(dataset)
## # A tibble: 6 × 15
##   city   country meal_…¹ water…² Milk_…³ Eggs_…⁴ Chick…⁵ Tomat…⁶ Potat…⁷ Onion…⁸
##   <chr>  <fct>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 Seoul  South …    7.68    0.79    2.2     4.04   10.6     6.19    3.84    2.92
## 2 Shang… China      5.69    0.33    2.74    2.22    4.86    1.53    0.84    1.04
## 3 Guang… China      4.13    0.33    1.91    1.71    3.77    1.31    0.74    1   
## 4 Mumbai India      3.68    0.19    0.75    0.95    3.69    0.59    0.44    0.44
## 5 Delhi  India      4.91    0.19    0.73    1.02    3.81    0.61    0.37    0.41
## 6 Dhaka  Bangla…    1.95    0.16    0.83    1.32    3.07    0.96    0.28    0.49
## # … with 5 more variables: Lettuce_1head <dbl>, Monthly_Travel_Pass <dbl>,
## #   Basic_Amenties_Bill <dbl>, Single_bedroom_rent <dbl>,
## #   Fruits_Avg_Price <dbl>, and abbreviated variable names ¹​meal_in_restaurant,
## #   ²​water_price, ³​Milk_onelitre, ⁴​Eggs_regular, ⁵​Chicken_1kg, ⁶​Tomato_1Kg,
## #   ⁷​Potato_1kg, ⁸​Onion_1kg
  • Combine the Tomato, Potato, Onion and Lettuce cost columns into a single Vegetables_Avg_Price column (their mean).
dataset <- dataset %>% 
  mutate(Vegetables_Avg_Price = (Tomato_1Kg + Potato_1kg + Onion_1kg + Lettuce_1head ) / 4) %>% 
  select(-Tomato_1Kg,- Potato_1kg, - Onion_1kg, - Lettuce_1head)
head(dataset)
## # A tibble: 6 × 12
##   city   country meal_…¹ water…² Milk_…³ Eggs_…⁴ Chick…⁵ Month…⁶ Basic…⁷ Singl…⁸
##   <chr>  <fct>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 Seoul  South …    7.68    0.79    2.2     4.04   10.6    42.2    182.     743.
## 2 Shang… China      5.69    0.33    2.74    2.22    4.86   28.5     66     1092.
## 3 Guang… China      4.13    0.33    1.91    1.71    3.77   28.5     59.6    533.
## 4 Mumbai India      3.68    0.19    0.75    0.95    3.69    4.91    43.6    522.
## 5 Delhi  India      4.91    0.19    0.73    1.02    3.81   11.7     58.1    230.
## 6 Dhaka  Bangla…    1.95    0.16    0.83    1.32    3.07   18.5     37.1    142.
## # … with 2 more variables: Fruits_Avg_Price <dbl>, Vegetables_Avg_Price <dbl>,
## #   and abbreviated variable names ¹​meal_in_restaurant, ²​water_price,
## #   ³​Milk_onelitre, ⁴​Eggs_regular, ⁵​Chicken_1kg, ⁶​Monthly_Travel_Pass,
## #   ⁷​Basic_Amenties_Bill, ⁸​Single_bedroom_rent
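
One property of the averaging steps above is worth illustrating: adding the columns and dividing by a fixed count yields NA whenever any component is missing. The sketch below uses toy values and a hypothetical data frame `prices`; the `rowMeans()` variant is an alternative shown for comparison, not what the report does.

```r
# Toy prices in USD; NA marks a missing quote (hypothetical data).
prices <- data.frame(
  Apples_1Kg  = c(6.77, 2.26, NA),
  Banana_1kg  = c(3.71, 1.60, 1.44),
  Oranges_1kg = c(6.50, 2.19, 1.82)
)

# Same arithmetic as the report: the result is NA if any component is NA.
strict_avg <- (prices$Apples_1Kg + prices$Banana_1kg + prices$Oranges_1kg) / 3

# Alternative: average whichever prices are available in each row.
lenient_avg <- unname(rowMeans(prices, na.rm = TRUE))

strict_avg
lenient_avg
```

The strict form is consistent with later dropping every incomplete row; the lenient form would keep more cities at the cost of averaging over fewer items for some of them.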
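  • Add a continent column by mapping the country column with the countrycode library, then convert continent to a categorical (factor) variable. The warning below shows that “Kosovo (Disputed Territory)” could not be matched, so its rows receive NA.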
dataset$continent <- countrycode(dataset$country, origin = "country.name", destination = "continent")
## Warning in countrycode_convert(sourcevar = sourcevar, origin = origin, destination = dest, : Some values were not matched unambiguously: Kosovo (Disputed Territory)
dataset$continent <- as.factor(dataset$continent)
head(dataset)
## # A tibble: 6 × 13
##   city   country meal_…¹ water…² Milk_…³ Eggs_…⁴ Chick…⁵ Month…⁶ Basic…⁷ Singl…⁸
##   <chr>  <fct>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 Seoul  South …    7.68    0.79    2.2     4.04   10.6    42.2    182.     743.
## 2 Shang… China      5.69    0.33    2.74    2.22    4.86   28.5     66     1092.
## 3 Guang… China      4.13    0.33    1.91    1.71    3.77   28.5     59.6    533.
## 4 Mumbai India      3.68    0.19    0.75    0.95    3.69    4.91    43.6    522.
## 5 Delhi  India      4.91    0.19    0.73    1.02    3.81   11.7     58.1    230.
## 6 Dhaka  Bangla…    1.95    0.16    0.83    1.32    3.07   18.5     37.1    142.
## # … with 3 more variables: Fruits_Avg_Price <dbl>, Vegetables_Avg_Price <dbl>,
## #   continent <fct>, and abbreviated variable names ¹​meal_in_restaurant,
## #   ²​water_price, ³​Milk_onelitre, ⁴​Eggs_regular, ⁵​Chicken_1kg,
## #   ⁶​Monthly_Travel_Pass, ⁷​Basic_Amenties_Bill, ⁸​Single_bedroom_rent
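
If the unmatched Kosovo rows should be kept rather than left as NA, `countrycode()` accepts a `custom_match` argument for hand-written mappings. This is a hedged sketch on a toy vector: assigning Kosovo to Europe is an assumption made here, not a decision taken in the report.

```r
library(countrycode)

# Toy vector of country names, including the value that failed to match.
countries <- c("India", "Albania", "Kosovo (Disputed Territory)")

# custom_match is a named vector: origin value = destination value.
# Mapping Kosovo to "Europe" is this sketch's assumption.
continent <- countrycode(
  countries,
  origin = "country.name",
  destination = "continent",
  custom_match = c("Kosovo (Disputed Territory)" = "Europe")
)

continent
```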
  • Analyse the NA values in the dataset.
# Count the number of missing (NA) values in each column of the data frame
na_counts <- colSums(is.na(dataset))


print(na_counts)
##                 city              country   meal_in_restaurant 
##                    0                    0                  428 
##          water_price        Milk_onelitre         Eggs_regular 
##                  316                  378                  507 
##          Chicken_1kg  Monthly_Travel_Pass  Basic_Amenties_Bill 
##                  558                 2166                  488 
##  Single_bedroom_rent     Fruits_Avg_Price Vegetables_Avg_Price 
##                 1363                  543                  703 
##            continent 
##                    8
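
Raw counts are easier to judge as a share of all rows; for instance, 2166 missing Monthly_Travel_Pass values out of 4956 rows is about 43.7%. The sketch below shows one way to tabulate counts and percentages together, using a small hypothetical data frame `df` rather than the real dataset.

```r
# Toy data with missing values (hypothetical).
df <- data.frame(
  city  = c("A", "B", "C", "D"),
  rent  = c(500, NA, NA, 300),
  water = c(0.5, 0.4, NA, 0.2),
  stringsAsFactors = FALSE
)

# Missing values per column, as a count and as a percentage of rows.
na_counts <- colSums(is.na(df))
na_share  <- round(100 * colMeans(is.na(df)), 1)

na_summary <- data.frame(
  column      = names(na_counts),
  n_missing   = as.integer(na_counts),
  pct_missing = as.numeric(na_share)
)

# Columns with the most missing values first.
na_summary <- na_summary[order(-na_summary$n_missing), ]
na_summary

# Number of rows that would survive dropping every incomplete row.
complete_rows <- sum(complete.cases(df))
complete_rows
```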

If we visualize the data without removing the NA values, the result differs from the cleaned version. For example, the chart below counts the rows of a few selected countries, grouped by continent:

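# The data stores Kosovo as "Kosovo (Disputed Territory)" (see the countrycode warning above)
selected_countries <- c("Kosovo (Disputed Territory)", "Afghanistan", "Albania", "Algeria", "American Samoa")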
selected_data <- dataset %>%
  filter(country %in% selected_countries)

selected_data_grouped <- selected_data %>%
  group_by(continent) %>%
  summarize(count = n())

ggplot(data = selected_data_grouped, aes(x = continent, y = count, fill = continent)) +
  geom_bar(stat = "identity") +
  xlab("Continent") +
  ylab("Count of Countries") +
  ggtitle("Count of Countries for each Continent") +
  scale_fill_manual(values = c("darkorange", "darkblue", "darkred","darkviolet"))

Here we remove every row that contains an NA value in any column.

dataset_without_na <- na.omit(dataset)
head(dataset_without_na)
## # A tibble: 6 × 13
##   city   country meal_…¹ water…² Milk_…³ Eggs_…⁴ Chick…⁵ Month…⁶ Basic…⁷ Singl…⁸
##   <chr>  <fct>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 Seoul  South …    7.68    0.79    2.2     4.04   10.6    42.2    182.     743.
## 2 Shang… China      5.69    0.33    2.74    2.22    4.86   28.5     66     1092.
## 3 Guang… China      4.13    0.33    1.91    1.71    3.77   28.5     59.6    533.
## 4 Mumbai India      3.68    0.19    0.75    0.95    3.69    4.91    43.6    522.
## 5 Delhi  India      4.91    0.19    0.73    1.02    3.81   11.7     58.1    230.
## 6 Dhaka  Bangla…    1.95    0.16    0.83    1.32    3.07   18.5     37.1    142.
## # … with 3 more variables: Fruits_Avg_Price <dbl>, Vegetables_Avg_Price <dbl>,
## #   continent <fct>, and abbreviated variable names ¹​meal_in_restaurant,
## #   ²​water_price, ³​Milk_onelitre, ⁴​Eggs_regular, ⁵​Chicken_1kg,
## #   ⁶​Monthly_Travel_Pass, ⁷​Basic_Amenties_Bill, ⁸​Single_bedroom_rent
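
Because `na.omit()` drops a row if any column is missing, a single sparse column (here Monthly_Travel_Pass, with 2166 NA values) can account for most of the lost rows. The sketch below contrasts that with requiring completeness only in chosen columns. The data frame `df` and the vector `needed` are hypothetical, and restricting the check is an alternative, not the approach taken in the report.

```r
# Toy data: travel_pass is sparse, the other columns are mostly complete.
df <- data.frame(
  city        = c("A", "B", "C", "D"),
  rent        = c(500, 450, NA, 300),
  travel_pass = c(40, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Approach used in the report: drop a row if any column is NA.
all_complete <- na.omit(df)

# Alternative: require completeness only in the columns an analysis needs.
needed <- c("city", "rent")
rent_complete <- df[complete.cases(df[, needed]), ]

nrow(all_complete)
nrow(rent_complete)
```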

Here we repeat the country-by-continent chart on the cleaned data. Some continents no longer appear because all of their rows for the selected countries contained NA values; such incomplete rows would not add value to the analysis.

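# Same selection as before; Kosovo is stored as "Kosovo (Disputed Territory)" in the data
selected_countries <- c("Kosovo (Disputed Territory)", "Afghanistan", "Albania", "Algeria", "American Samoa")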
selected_data <- dataset_without_na %>%
  filter(country %in% selected_countries)

selected_data_grouped <- selected_data %>%
  group_by(continent) %>%
  summarize(count = n())

ggplot(data = selected_data_grouped, aes(x = continent, y = count, fill = continent)) +
  geom_bar(stat = "identity") +
  xlab("Continent") +
  ylab("Count of Countries") +
  ggtitle("Count of Countries for each Continent") +
  scale_fill_manual(values = c("darkorange", "darkblue", "darkred","darkviolet"))

Rechecking whether any columns have any NA values.

na_counts <- colSums(is.na(dataset_without_na))


print(na_counts)
##                 city              country   meal_in_restaurant 
##                    0                    0                    0 
##          water_price        Milk_onelitre         Eggs_regular 
##                    0                    0                    0 
##          Chicken_1kg  Monthly_Travel_Pass  Basic_Amenties_Bill 
##                    0                    0                    0 
##  Single_bedroom_rent     Fruits_Avg_Price Vegetables_Avg_Price 
##                    0                    0                    0 
##            continent 
##                    0
  • Check for rows in which any value is an empty string (“”) and remove them from the data frame.
df_without_empty <- dataset_without_na %>% filter(!(rowSums(dataset_without_na == "") > 0))
head(df_without_empty)
## # A tibble: 6 × 13
##   city   country meal_…¹ water…² Milk_…³ Eggs_…⁴ Chick…⁵ Month…⁶ Basic…⁷ Singl…⁸
##   <chr>  <fct>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 Seoul  South …    7.68    0.79    2.2     4.04   10.6    42.2    182.     743.
## 2 Shang… China      5.69    0.33    2.74    2.22    4.86   28.5     66     1092.
## 3 Guang… China      4.13    0.33    1.91    1.71    3.77   28.5     59.6    533.
## 4 Mumbai India      3.68    0.19    0.75    0.95    3.69    4.91    43.6    522.
## 5 Delhi  India      4.91    0.19    0.73    1.02    3.81   11.7     58.1    230.
## 6 Dhaka  Bangla…    1.95    0.16    0.83    1.32    3.07   18.5     37.1    142.
## # … with 3 more variables: Fruits_Avg_Price <dbl>, Vegetables_Avg_Price <dbl>,
## #   continent <fct>, and abbreviated variable names ¹​meal_in_restaurant,
## #   ²​water_price, ³​Milk_onelitre, ⁴​Eggs_regular, ⁵​Chicken_1kg,
## #   ⁶​Monthly_Travel_Pass, ⁷​Basic_Amenties_Bill, ⁸​Single_bedroom_rent
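
The empty-string check can be seen in isolation on a small example. This is a minimal base R sketch on hypothetical data; unlike the call above, it restricts the comparison to character columns, which is a choice made for this sketch.

```r
# Toy data with empty strings in the text columns (hypothetical).
df <- data.frame(
  city    = c("Seoul", "", "Delhi"),
  country = c("South Korea", "China", ""),
  rent    = c(742.54, 1091.93, 229.84),
  stringsAsFactors = FALSE
)

# Compare only the character columns against "".
is_text   <- vapply(df, is.character, logical(1))
has_empty <- rowSums(df[, is_text, drop = FALSE] == "") > 0

# Keep the rows without any empty string.
df_clean <- df[!has_empty, ]
df_clean
```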
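  • Remove leading and trailing spaces from the values. Note that the call below only prints a trimmed copy: the result is not assigned, so df_without_empty is unchanged, and applying trimws() to every column converts the numeric columns to character, as the <chr> types in the output show.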
df_without_empty %>% 
  mutate_all(list(~ trimws(.)))
## # A tibble: 2,319 × 13
##    city  country meal_…¹ water…² Milk_…³ Eggs_…⁴ Chick…⁵ Month…⁶ Basic…⁷ Singl…⁸
##    <chr> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
##  1 Seoul South … 7.68    0.79    2.2     4.04    10.58   42.25   182.13  742.54 
##  2 Shan… China   5.69    0.33    2.74    2.22    4.86    28.47   66      1091.93
##  3 Guan… China   4.13    0.33    1.91    1.71    3.77    28.47   59.65   533.28 
##  4 Mumb… India   3.68    0.19    0.75    0.95    3.69    4.91    43.57   522.4  
##  5 Delhi India   4.91    0.19    0.73    1.02    3.81    11.67   58.07   229.84 
##  6 Dhaka Bangla… 1.95    0.16    0.83    1.32    3.07    18.53   37.06   142.09 
##  7 Osaka Japan   7.45    0.81    1.41    1.9     6.93    88.81   128.77  674.96 
##  8 Jaka… Indone… 2.59    0.27    1.3     1.71    3.52    11.34   83.88   505.59 
##  9 Shen… China   4.27    0.34    2.23    2.13    4.37    35.59   67.63   738.75 
## 10 Kins… Congo   15.11   0.84    2       4.15    5       30      553.99  2000   
## # … with 2,309 more rows, 3 more variables: Fruits_Avg_Price <chr>,
## #   Vegetables_Avg_Price <chr>, continent <chr>, and abbreviated variable names
## #   ¹​meal_in_restaurant, ²​water_price, ³​Milk_onelitre, ⁴​Eggs_regular,
## #   ⁵​Chicken_1kg, ⁶​Monthly_Travel_Pass, ⁷​Basic_Amenties_Bill,
## #   ⁸​Single_bedroom_rent
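
To trim whitespace while keeping the numeric columns numeric, the trimming can be limited to text columns and the result assigned back. The sketch below runs on hypothetical toy data and is one possible approach, not the report's code.

```r
library(dplyr)

# Toy data with stray spaces in a character and a factor column (hypothetical).
df <- data.frame(
  city    = c(" Seoul", "Delhi  "),
  country = factor(c("South Korea ", "India")),
  rent    = c(742.54, 229.84),
  stringsAsFactors = FALSE
)

df_trimmed <- df %>%
  # Trim character columns only; numeric columns are left untouched.
  mutate(across(where(is.character), trimws)) %>%
  # Factors are trimmed via their character values, then rebuilt.
  mutate(across(where(is.factor), ~ factor(trimws(as.character(.x)))))

str(df_trimmed)
```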
  • Inspect the outliers using box plots.
numeric_cols <- sapply(df_without_empty, is.numeric)
df_numeric <- df_without_empty[, numeric_cols]

for (col in names(df_numeric)) {
  boxplot(df_numeric[, col], main = col)
}

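Create a function that caps the outliers: values below the chosen lower quantile or above the matching upper quantile are replaced by that bound.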

replace_outliers <- function(x, quantile) {
  upper_bound <- quantile(x, probs = 1 - quantile)
  lower_bound <- quantile(x, probs = quantile)
  x[x > upper_bound] <- upper_bound
  x[x < lower_bound] <- lower_bound
  return(x)
}

# across() with a lambda replaces the deprecated mutate_if() + funs() combination
df_without_empty_no_outliers <- df_without_empty %>% 
  mutate(across(where(is.numeric), ~ replace_outliers(.x, 0.05)))

The box plots after capping the outliers:

numeric_cols <- sapply(df_without_empty_no_outliers, is.numeric)
df_numeric <- df_without_empty_no_outliers[, numeric_cols]

for (col in names(df_numeric)) {
  boxplot(df_numeric[, col], main = col)
}
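
The capping step above can be checked on a small vector. The sketch below defines `cap_outliers()`, a hypothetical helper that mirrors the quantile logic of `replace_outliers()`, and compares it with what the standard box plot rule flags. The input vector is made up for illustration.

```r
# Hypothetical helper: cap values at the p and 1 - p quantiles.
cap_outliers <- function(x, p = 0.05) {
  bounds <- quantile(x, probs = c(p, 1 - p), na.rm = TRUE, names = FALSE)
  pmin(pmax(x, bounds[1]), bounds[2])
}

# Toy vector with one extreme value.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

# Points a box plot would draw as outliers (beyond 1.5 * IQR from the hinges).
flagged <- boxplot.stats(x)$out

# Capped version: same length, extremes pulled in to the quantile bounds.
capped <- cap_outliers(x)

flagged
capped
```

Note that quantile capping changes the smallest and largest values of every column, whether or not a box plot would flag them, and it keeps the number of rows unchanged.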

  • Check the dimensions of the data frame.
dimensions <- dim(df_without_empty_no_outliers)
print(dimensions)
## [1] 2319   13
cost_of_living_cleaned_RahulBijoy <- df_without_empty_no_outliers

After cleaning, the final dataset has 2319 rows and 13 columns.

PURPOSE OF DATA

The purpose of this project is to assist students and travelers in making informed decisions regarding the cost of living in different cities and countries. This data contains information on the cost of various items such as meals in a restaurant, water price, rent, and more, across cities and countries. By analyzing this data, I aim to provide a comprehensive understanding of the cost of living in different regions, thereby helping travelers and students to make informed decisions.

  • Comparison of Cost of Living across Cities and Countries: One of the key objectives of this project is to compare the cost of living across cities and countries. This will involve analyzing the cost of different items such as meals in a restaurant, water price, rent, and more, in various cities and countries. By comparing the prices of these items, I can get a comprehensive understanding of the overall cost of living in each location, and make an informed decision about the most affordable option.

  • Predicting Cost of Living in the Future: In addition to comparing the cost of living across cities and countries, I aim to make predictions about the cost of living in the future. To achieve this, I will use statistical techniques such as regression to analyze the cost of living data and make predictions about future trends. This information will be valuable for travelers who are planning trips in advance and need to make arrangements accordingly.

  • Comparing the Cost of Living across Continents: Another key aspect of this project is to compare the cost of living across different continents. By analyzing the cost of living data, I aim to understand how prices vary across continents, which will help travelers choose a destination based on their budget and priorities.
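
A comparison across continents could be computed along the following lines. This is a hedged sketch on a five-city extract: the rent values are taken from the printed output earlier in the report, while the data frame `cost`, the continent labels typed by hand, and the choice of the median as the summary statistic are assumptions of this example.

```r
library(dplyr)

# Five cities from the printed output; continents entered by hand.
cost <- data.frame(
  city = c("Seoul", "Mumbai", "Delhi", "Osaka", "Kinshasa"),
  continent = c("Asia", "Asia", "Asia", "Asia", "Africa"),
  Single_bedroom_rent = c(742.54, 522.40, 229.84, 674.96, 2000),
  stringsAsFactors = FALSE
)

# Number of cities and median rent per continent, most expensive first.
by_continent <- cost %>%
  group_by(continent) %>%
  summarise(
    cities = n(),
    median_rent = median(Single_bedroom_rent),
    .groups = "drop"
  ) %>%
  arrange(desc(median_rent))

by_continent
```

With so few rows the numbers mean little on their own; the point is the shape of the pipeline, which applies unchanged to the cleaned data frame.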

This cost of living data can be a valuable resource for students and travelers who are planning their next adventure. It supports informed decisions about where to go, and the regression-based predictions will be valuable for travelers who plan their trips in advance.

DASHBOARD DESIGN AND STORY

The initial dashboard design was built as a Shiny app. The dashboard is intended for students and travellers.

| SECTION | DESCRIPTION | FUNCTIONS USED |
|---|---|---|
| 1. Header | The header contains the name of the dashboard, the message dropdown, and my LinkedIn profile. | dashboardHeader(), title, right_ui, tags$li(), tags$button(), class, type, data-toggle, HTML(), paste0(), tags$span(), tags$ul(), menuItem(), icon(), href |
| 2. Sidebar | The sidebar contains 4 menu item buttons (General, Traveller, Student, Data) and a Continent dropdown. | dashboardSidebar(), sidebarMenu(), id, menuItem(), tabName, icon, selectInput(), choices |
| 3. Dashboard body | The dashboard body varies with the option selected in the sidebar. The General option has an overview tab item. The Traveller option has the tab items General, Expensive options and Cheap options. The Student option has the tab items General, Expensive options and Cheap options. The Data option shows the cleaned R dataset. | dashboardBody(), style, tabItems(), tabItem(), fluidRow(), box(), status, solidHeader, width, height, infoBoxOutput(), valueBoxOutput(), textOutput(), tabBox(), tabPanel(), HTML(), ol, li, tags$div(), tags$img(), src |
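
The three-part layout described above can be sketched as a skeleton app. This is an illustrative outline only, assuming the shinydashboard package (the `right_ui` argument listed above suggests the original may use a different or extended dashboard package). All ids, titles, icons and placeholder contents are hypothetical; only the four menu items, the Continent dropdown and the row count of 2319 come from the report.

```r
library(shiny)
library(shinydashboard)

ui <- dashboardPage(
  # 1. Header: dashboard title only in this sketch.
  dashboardHeader(title = "Cost of Living"),
  # 2. Sidebar: four menu items plus a continent dropdown.
  dashboardSidebar(
    sidebarMenu(
      id = "tabs",
      menuItem("General", tabName = "general", icon = icon("globe")),
      menuItem("Traveller", tabName = "traveller", icon = icon("plane")),
      menuItem("Student", tabName = "student", icon = icon("graduation-cap")),
      menuItem("Data", tabName = "data", icon = icon("table"))
    ),
    selectInput(
      "continent", "Continent",
      choices = c("Africa", "Americas", "Asia", "Europe", "Oceania")
    )
  ),
  # 3. Body: one tabItem per sidebar entry.
  dashboardBody(
    tabItems(
      tabItem(tabName = "general", fluidRow(valueBoxOutput("n_cities"))),
      tabItem(tabName = "traveller", h3("Traveller view")),
      tabItem(tabName = "student", h3("Student view")),
      tabItem(tabName = "data", tableOutput("cleaned"))
    )
  )
)

server <- function(input, output, session) {
  # Placeholder value box; 2319 is the cleaned row count from the report.
  output$n_cities <- renderValueBox({
    valueBox(2319, paste("cities in the data; selected:", input$continent))
  })
  # Placeholder table standing in for the cleaned dataset.
  output$cleaned <- renderTable({
    data.frame(city = c("Seoul", "Delhi"), continent = c("Asia", "Asia"))
  })
}

# shinyApp(ui, server)  # uncomment to launch the app locally
```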

This dashboard is designed to provide an analysis of the cost of living across different continents for both travelers and students. The data includes the cost of items such as meals in restaurants, water, milk, eggs, chicken, monthly travel passes, basic amenities bills, single-bedroom rents, and fruit and vegetables, together with the continent in which each country is located. The data consists of 2319 rows and includes countries like South Korea, China, India, Bangladesh, Congo, Thailand, Pakistan, Egypt, Brazil, Mexico, Nigeria, Russia, Japan, Philippines, and the United States.

The dashboard’s main purpose is to help individuals plan their expenses when moving to a new city or travelling to a new destination. The data can be used to compare the cost of living in different cities, regions, and continents. This information can be especially useful for people who are planning to move to a new city or country or for businesses that want to expand to new locations.

The dashboard includes four tabs: General, Traveller, Student, and Data.

The General tab provides an overview of the cost of living in different countries and continents. It includes information such as the number of countries and cities in the dataset, the most expensive and cheapest countries and cities, and the number of countries in each continent. This tab also includes a description of the elements used to calculate the living expenses in each city or country.

The Traveller tab is specifically designed for travelers and includes information on the cost of living in different cities. This tab is divided into two sections: General and Compare. The General section provides general information on the cost of living in different cities and the benefits of using this data when planning a trip. The Compare section allows users to compare the cost of living in up to four different cities.

The Student tab is specifically designed for students and includes information on the cost of living in different countries. This tab is divided into two sections: General and Compare. The General section provides general information on the cost of living in different countries and the benefits of using this data when planning to study abroad. The Compare section allows users to compare the cost of living in up to four different countries.

The Data tab provides access to the cleaned R data used in this dashboard.

Overall, this dashboard provides users with valuable information on the cost of living in different countries and continents. By using this data, individuals can plan their expenses, choose a destination that fits their budget and preferences, and make informed decisions when moving to a new city or country. The final dashboard is available on shinyapps.io:

https://rahuleldho.shinyapps.io/Rfirstassignment_Rahul_Bijoy/

CONCLUSION

As a student, I believe that learning R is an excellent way to gain skills in data analysis and visualization. Shiny, a web application framework for R, provides an interactive platform for creating dashboards and data visualizations, which are powerful tools for analyzing and presenting data. By building a Shiny app to analyze living costs, I not only improved my proficiency in R and Shiny but also gained valuable skills in data visualization and communication. This project was an excellent opportunity to explore data and gain insights into global living costs. With data becoming increasingly essential in many fields, skills in data analysis and presentation are becoming vital. I still need to develop my knowledge further, since I faced many challenges along the way, and I still remember your advice from the first class: try it again.